30 research outputs found

    Unbiased protein association study on the public human proteome reveals biological connections between co-occurring protein pairs

    Mass-spectrometry-based, high-throughput proteomics experiments produce large amounts of data. While typically acquired to answer specific biological questions, these data can also be reused in orthogonal ways to reveal new biological knowledge. Here we present a novel method for such orthogonal reuse of public proteomics data. Our method elucidates biological relationships between proteins based on the co-occurrence of these proteins across human experiments in the PRIDE database. The majority of the significantly co-occurring protein pairs detected by our method could be mapped to existing biological knowledge. The validity of the method is substantiated by the fact that extremely few pairs can be mapped to existing knowledge when random associations between the same set of proteins are tested instead. Moreover, using literature searches and the STRING database, we were able to derive meaningful biological associations for unannotated protein pairs detected by our method, further illustrating that as-yet unknown associations present highly interesting targets for follow-up analysis.
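
    A minimal sketch of pairwise co-occurrence testing, assuming each experiment has already been reduced to the set of proteins it identified. The abstract does not name the statistic used; the hypergeometric upper-tail test, the Bonferroni correction, and the names `cooccurring_pairs` and `alpha` are illustrative assumptions, not the published method.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, n_total, n_a, n_b):
    """P(X >= k) for X ~ Hypergeometric: of n_total experiments, n_a contain
    protein A; X counts how many of the n_b experiments containing B also contain A."""
    denom = comb(n_total, n_b)
    return sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    ) / denom


def cooccurring_pairs(experiments, alpha=0.05):
    """Return (a, b, count, p) for protein pairs that co-occur more often than
    expected by chance (assumed test: hypergeometric + Bonferroni)."""
    n_total = len(experiments)
    protein_counts = {}  # protein -> number of experiments it appears in
    pair_counts = {}     # (protein_a, protein_b) -> number of shared experiments
    for proteins in experiments:
        for p in proteins:
            protein_counts[p] = protein_counts.get(p, 0) + 1
        for a, b in combinations(sorted(proteins), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    n_tests = len(pair_counts)  # only pairs seen together at least once are tested
    hits = []
    for (a, b), k in pair_counts.items():
        p = hypergeom_sf(k, n_total, protein_counts[a], protein_counts[b])
        if p * n_tests <= alpha:  # Bonferroni-corrected threshold
            hits.append((a, b, k, p))
    return sorted(hits, key=lambda h: h[3])
```

    Comparing the number of hits against the same procedure run on shuffled protein labels would mirror the random-association control described above.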

    sORFs.org : a repository of small ORFs identified by ribosome profiling

    With the advent of ribosome profiling, a next-generation sequencing technique providing a ‘snapshot’ of the mRNA being translated in a cell, many short open reading frames (sORFs) were identified. Follow-up studies revealed the existence of functional peptides, so-called micropeptides, translated from these sORFs, indicating a new class of bioactive peptides. Over the last few years, several micropeptides exhibiting important cellular functions have been discovered. However, ribosome occupancy does not necessarily imply that the translated peptide is functional, which has led to the development of various tools for assessing the coding potential of sORFs. Here, we introduce sORFs.org (http://www.sorfs.org), a novel database for sORFs identified using ribosome profiling. Starting from ribosome profiling data, sORFs.org identifies sORFs, applies state-of-the-art tools and metrics, and stores the results in a public database. Two query interfaces are provided: a default one enabling quick lookup of sORFs, and a BioMart interface providing advanced query and export possibilities. At present, sORFs.org harbors 263 354 sORFs that show ribosome occupancy, originating from three different cell lines: HCT116 (human), E14_mESC (mouse) and S2 (fruit fly). sORFs.org aims to provide an extensive sORF database accessible to researchers with limited bioinformatics knowledge, thus enabling easy integration into personal projects.
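
    To make the notion of a short open reading frame concrete, the hypothetical `find_sorfs` below scans a transcript in its three forward frames for ATG-initiated ORFs that end at an in-frame stop codon. This is a sequence-only sketch, not the sORFs.org pipeline (which starts from ribosome profiling data), and the 10 to 100 codon window is an assumed cut-off.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, frame) for ATG...stop ORFs of min..max codons.

    `end` is exclusive and includes the stop codon; the codon count includes
    the start codon but not the stop. Only the most upstream ATG of each ORF
    is reported (assumed convention)."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop codon: not a complete ORF
            n_codons = (j - i) // 3
            if min_codons <= n_codons <= max_codons:
                hits.append((i, j + 3, frame))
            i = j + 3  # resume scanning after the stop codon
    return sorted(hits)
```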

    Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory

    Modern-day proteomics generates ever more complex data, causing the storage and processing requirements of such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of current techniques for distributed computing, along with examples of how these techniques are currently employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.

    Pladipus enables universal distributed computing in proteomics bioinformatics

    The use of proteomics bioinformatics substantially contributes to an improved understanding of proteomes, but this novel and in-depth knowledge comes at the cost of increased computational complexity. Parallelization across multiple computers, a strategy termed distributed computing, can be used to handle this increased complexity; however, setting up and maintaining a distributed computing infrastructure requires resources and skills that are not readily available to most research groups. Here we propose a free and open-source framework named Pladipus that greatly facilitates the establishment of distributed computing networks for proteomics bioinformatics tools. Pladipus is straightforward to install and operate thanks to its user-friendly graphical interface, allowing complex bioinformatics tasks to be run easily on a network instead of a single computer. As a result, any researcher can benefit from the increased computational efficiency provided by distributed computing, empowering them to tackle more complex bioinformatics challenges. Notably, it enables any research group to perform large-scale reprocessing of publicly available proteomics data, thus supporting the scientific community in mining these data for novel discoveries.
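
    The basic idea of spreading independent tasks over several workers can be sketched as a shared job queue that workers drain until it is empty. In this illustrative sketch, threads on a single machine stand in for networked worker computers, and `run_distributed` and `process` are hypothetical names; this is not Pladipus's API.

```python
import queue
import threading


def run_distributed(tasks, process, n_workers=4):
    """Apply `process` to every task using a pool of workers that pull jobs
    from a shared queue; results are returned in the original task order."""
    jobs = queue.Queue()
    for index, task in enumerate(tasks):
        jobs.put((index, task))
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            results[index] = process(task)  # each index is written by one worker only

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    Because workers pull jobs rather than receiving a fixed share, faster workers simply process more tasks, which is the property that makes a queue a natural fit for heterogeneous machines.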

    Anatomy and evolution of database search engines — a central component of mass spectrometry based proteomic workflows

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.
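
    A toy version of the common search paradigm: keep database peptides whose precursor mass matches the measured one, generate their theoretical fragment ions, and score them against the observed peaks. The shared-peak-count score, the tolerances, and the function names are simplifying assumptions rather than any specific engine's algorithm; only singly charged b- and y-ions are considered, and modifications and enzymatic digestion are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified Cys).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(seq):
    """Neutral monoisotopic peptide mass."""
    return sum(MONO[aa] for aa in seq) + WATER


def fragment_ions(seq):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(seq)):
        ions.append(sum(MONO[aa] for aa in seq[:i]) + PROTON)          # b_i
        ions.append(sum(MONO[aa] for aa in seq[i:]) + WATER + PROTON)  # y_(n-i)
    return ions


def shared_peak_count(peaks, theoretical, tol):
    """Number of observed peaks explained by at least one theoretical ion."""
    return sum(1 for p in peaks if any(abs(p - t) <= tol for t in theoretical))


def search(peaks, precursor_mass, database, ppm_tol=10.0, frag_tol=0.02):
    """Return (score, peptide) for precursor-matched candidates, best first."""
    hits = []
    for pep in database:
        error_ppm = abs(peptide_mass(pep) - precursor_mass) / precursor_mass * 1e6
        if error_ppm <= ppm_tol:
            hits.append((shared_peak_count(peaks, fragment_ions(pep), frag_tol), pep))
    return sorted(hits, reverse=True)
```

    Note that EPPTIDE and PEPTIDE share a precursor mass, so both pass the first filter; only the fragment-level score separates them, which is why fragment matching is the heart of the paradigm.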

    An update on LNCipedia : a database for annotated human lncRNA sequences

    The human genome is pervasively transcribed, producing thousands of non-coding RNA transcripts. The majority of these transcripts are long non-coding RNAs (lncRNAs), and novel lncRNA genes are being identified at a rapid pace. To streamline these efforts, we created LNCipedia, an online repository of lncRNA transcripts and annotation. Here, we present LNCipedia 3.0 (http://www.lncipedia.org), the latest version of the publicly available human lncRNA database. Compared to the previous version of LNCipedia, the database grew more than fivefold in size, gaining over 90 000 new lncRNA transcripts. Assessment of the protein-coding potential of LNCipedia entries is improved with state-of-the-art methods that include large-scale reprocessing of publicly available proteomics data. As a result, a high-confidence set of lncRNA transcripts with low coding potential is defined and made available for download. In addition, a tool to assess lncRNA gene conservation between human, mouse and zebrafish has been implemented.

    LNCipedia 5 : towards a reference set of human long non-coding RNAs

    While long non-coding RNA (lncRNA) research has in the past primarily focused on the discovery of novel genes, it has today shifted towards the functional annotation of this large class of genes. With thousands of lncRNA studies published every year, the current challenge lies in keeping track of which lncRNAs have been functionally described. This is further complicated by the fact that lncRNA nomenclature is not straightforward and lncRNA annotation is scattered across different resources, each with its own quality metrics and definition of a lncRNA. To overcome this issue, large-scale curation and annotation are needed. Here, we present the fifth release of the human lncRNA database LNCipedia (https://lncipedia.org). The most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available. In addition, an improved filtering pipeline results in a higher-quality reference lncRNA gene set.

    Exploring the potential of public proteomics data

    In a global effort towards scientific transparency, it has become feasible and good practice to share the experimental data supporting novel findings. Consequently, the amount of publicly available MS-based proteomics data has grown substantially in recent years. With some notable exceptions, however, this extensive material has largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, show ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess, and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.

    Noncoding after all : biases in proteomics data do not explain observed absence of lncRNA translation products

    Over the past decade, long noncoding RNAs (lncRNAs) have emerged as novel functional entities of the eukaryotic genome. However, the scientific community remains divided over how many of the large number of unannotated transcripts identified by recent large-scale, deep RNA-sequencing efforts are truly noncoding. Here, we systematically exclude possible technical reasons underlying the absence of lncRNA-encoded proteins in mass spectrometry data sets, strongly suggesting that the vast majority of lncRNAs are indeed not translated.